perm filename MULTIP[S83,JMC], sn#717191, filedate 1983-06-18
RUN-TIME CONTROL OF MULTI-PROCESSING ON THE S-1 BY THE USER PROGRAM
In my opinion, a multiple-control, multiple-processor computer
like the S-1 is the best bet for getting good use of multiple processors
on a single problem. However, it isn't going to be easy, and several
approaches may have to be tried and compared. Indeed it seems likely
to me that several approaches will actually survive. The reason is that
different combinations of emphasis on specializing the whole software
to a particular application and maintaining general capability will be
needed depending on the application and the extent to which
a particular S-1 system is dedicated to a particular application.
There are several ways of arranging for several processors to
work on a single problem. Many of them involve putting the program
into a form that specifies how many subtasks each task is to be
divided into. For example, a statement
for i := 1 step 1 until 1000 do A
may be divided by a compiler into 16 tasks, each involving 62 or 63 values of i.
Erik Gilbert's thesis work is along this line.
The present note pursues a different approach; the multi-processing
is controlled by the user program itself using queues of tasks.
In this approach the above statement would be written
pfor[qlet] i := 1 step 1 until 1000 do A
where qlet is a Boolean expression. This statement is compiled
into code that behaves as follows. First the Boolean expression
qlet is evaluated. If qlet is false, then parallelism is not allowed
in this execution, and the pfor statement is executed as an ordinary
for statement on the processor that encountered it.
If qlet evaluates to true, then the processor executing the statement
executes a CALL (either to the system or to a part of the user's
own program) that queues a task corresponding to
pfor[qlet] i := 2 step 1 until 1000 do A
and then starts to execute the task A for i=1.
Interpreter based systems can operate similarly.
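As a minimal sketch of the pfor idea, the following Python fragment gates parallelism on a run-time Boolean qlet, exactly as described above: when qlet is false the loop runs as an ordinary for statement on the calling processor, and when it is true the iterations are queued as tasks. Threads and ThreadPoolExecutor here stand in for S-1 processors and the queuing program; the function name pfor and its signature are illustrative, not the memo's actual syntax, and all iterations are submitted at once rather than split off recursively as in the text.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def pfor(qlet, iterable, body):
    """Sketch of pfor[qlet]: run body over iterable, in parallel
    only if the Boolean qlet evaluates to true."""
    if not qlet:
        # qlet false: behave exactly like an ordinary for statement.
        for i in iterable:
            body(i)
        return
    # qlet true: queue each iteration as a task, then wait for all
    # of them (an AND of tasks) before continuing.
    with ThreadPoolExecutor() as pool:
        for f in [pool.submit(body, i) for i in iterable]:
            f.result()

# Example: accumulate a sum of squares; serial and parallel
# executions give the same answer.
total = 0
lock = threading.Lock()

def body(i):
    global total
    with lock:
        total += i * i

pfor(True, range(1, 11), body)
```

Because qlet is evaluated each time the statement is reached, the same loop can run serially during debugging and in parallel in production, which is the point of making the decision at run time.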
This makes the decision of whether a task is to be divided
into subtasks a run-time decision that can depend on the state of
the computation at a given time. This is important, because
parallelism costs overhead and is subject to bugs.
It is most valuable when it is desirable to complete a task in
minimum time, and several processors are available. A given
program, for example a matrix subroutine, may be executed with different
arguments. Sometimes the computation is small or there aren't processors
available, and it is better not to attempt parallelism. Sometimes
it is better to debug the algorithm without parallelism before
attempting parallelism. We envisage that the actual parameter qlet
may include facts about the state of the operating system, facts about
the values of the variables, and variables that are set according
to whether the user wants to allow parallelism this time.
Of course, the programmer will often treat the qlet parameter
in many, perhaps most, occurrences of pfor more simply. He may
set it to true leaving parallelism up to the queuing program, to
false, forbidding parallelism, or he may give it some single global
parameter. User program control of parallelism may be especially
important for certain real-time programs.
Other constructions may also permit parallelism and have
qlet parameters. The following seem sufficient:
1. pfor as described above
2. multival[qlet,e1,...,en]. If qlet is true, this
will queue the tasks of evaluating e2,...,en and start evaluating
e1. multival then serves as n arguments of a function or subroutine.
For example, suppose we have a function definition
f(x,y) = sqrt(x**2 + y**2).
Then the expression f(multival[qlet,e1,e2]) has the same value
as f(e1,e2) but permits e1 and e2 to be evaluated in parallel
if qlet is true. If qlet is false, then evaluation of arguments
must be from left to right which permits expressions with side-effects
affecting the values of other expressions. Composite expressions
like multival[qlet1,multival[qlet2,x,y],z] can be used to enforce
combinations of sequentiality and parallelism.
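The multival construction can be sketched in Python by passing the arguments as thunks (zero-argument functions), so that their evaluation can be deferred and, when qlet is true, run in parallel. The names multival and f are taken from the text; the thunk convention and the thread pool are illustrative assumptions, since Python evaluates arguments eagerly.

```python
from concurrent.futures import ThreadPoolExecutor

def multival(qlet, *thunks):
    """Sketch of multival[qlet, e1, ..., en]: evaluate the argument
    thunks in parallel if qlet is true, otherwise strictly left to
    right, preserving any side-effect ordering."""
    if not qlet:
        return [t() for t in thunks]          # sequential, left to right
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(t) for t in thunks]
        return [f.result() for f in futures]  # AND: collect all values

def f(x, y):
    return (x * x + y * y) ** 0.5

# f(multival[qlet, e1, e2]) from the text becomes:
result = f(*multival(True, lambda: 3.0, lambda: 4.0))
```

Nesting works as in the text: multival(qlet1, lambda: multival(qlet2, ...), ...) lets the programmer mix sequential and parallel argument evaluation.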
In LISP-like languages permitting lambda expressions, there
will also be a construction
qlambda[qlet,(x ... z) e]
that permits the arguments of the lambda-expression to be evaluated
in parallel.
3. goto[qlet,foo] l1,...,ln, where foo is an identifier,
is a parallel go to. Its execution,
when qlet is true, queues tasks corresponding to l2,...,ln
and executes the statement go to l1.
When a statement join[foo] is reached, the program checks
whether all the other tasks started by the goto[qlet,foo]
are done. If not, the processor goes
back to the queuing program for another task. If the others are
done, the processor continues the program from that point.
[1983 April 18 - A simple command might be qgoto[qlet] l. It
always continues to the next instruction but if qlet evaluates
to true, also sets up a task starting at l.]
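The parallel go to and join pair can be sketched as a fork-join in Python, with each label represented by a function and threads standing in for processors. The name parallel_goto and the list-of-functions representation are illustrative; when qlet is false the branches simply run one after another on the calling processor.

```python
import threading

def parallel_goto(qlet, labels):
    """Sketch of goto[qlet, foo] l1,...,ln with its join[foo].
    Each label is a function; with qlet true, l2..ln run as
    parallel tasks while the caller takes l1, and the join
    waits for all of them."""
    if not qlet:
        for label in labels:       # degenerate case: run serially
            label()
        return
    threads = [threading.Thread(target=label) for label in labels[1:]]
    for t in threads:
        t.start()
    labels[0]()                    # this processor executes l1
    for t in threads:              # join[foo]: wait for the others
        t.join()

results = []
lock = threading.Lock()

def make_label(n):
    def label():
        with lock:
            results.append(n)
    return label

parallel_goto(True, [make_label(i) for i in range(4)])
```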
Control of queuing requires special provisions in
both the run-time routines of the object program and in the
operating system. The following considerations apply.
1. The run-times need to tell the operating system how
many processors can be used if available. However, perhaps the
answer will almost always be "as many as possible" or "more than
you've got". The operating system needs to tell the run-times
how many are available now or are likely to be available in the
next few seconds (perhaps in the next few milliseconds). It isn't
immediately clear what should be the time scale of multi-processing.
2. The number of tasks queued needs to be controlled.
It need only be large enough to keep the available processors
busy. There is sometimes a need to keep the processors working
on the same part of the data in order to keep down the amount
of data that has to be in memory at one time. When this is
an important consideration, it may be necessary to interrupt
certain processes and recruit all available processors to work
close together so as to get done with the requirement for a certain
block of data. This is best done by the user's program rather
than by the operating system.
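Point 2 can be illustrated with a bounded task queue: the producer blocks once enough tasks are waiting to keep the available processors busy, instead of queuing everything at once. This is a hedged sketch under the assumption that worker threads model processors and a queue.Queue models the queuing program; the function name and sentinel convention are mine, not the memo's.

```python
import queue
import threading

def run_with_bounded_queue(tasks, n_processors, max_queued):
    """Sketch of bounded task queuing: only max_queued tasks may
    wait at once, so the producer cannot run far ahead of the
    n_processors workers consuming them."""
    q = queue.Queue(maxsize=max_queued)   # put() blocks when full
    done = []
    lock = threading.Lock()

    def worker():
        while True:
            task = q.get()
            if task is None:              # sentinel: no more tasks
                return
            result = task()
            with lock:
                done.append(result)

    workers = [threading.Thread(target=worker)
               for _ in range(n_processors)]
    for w in workers:
        w.start()
    for t in tasks:
        q.put(t)        # blocks once max_queued tasks are waiting
    for _ in workers:
        q.put(None)     # one sentinel per worker
    for w in workers:
        w.join()
    return done

out = run_with_bounded_queue(
    [lambda i=i: i * 2 for i in range(10)], 3, 4)
```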
3. The most important form of parallelism occurs when tasks are
being ANDed, i.e. all of the parallel tasks are to be completed
before the computation can continue. However, sometimes we have
an OR of tasks such as a parallel search for the solution to
a subproblem in which the first process to find the solution must
turn off the other searchers. In general parallel search is to be
avoided, because if we can start with the most likely prospect,
serial searching will involve less expected total compute time
than parallel search. However, parallel search may be justified
when real time is important.
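An OR of tasks of the kind described here can be sketched with a shared stop flag: the first searcher to find a solution sets the flag, and the others poll it and give up, standing in for turning off their processors. The names or_search and make_searcher, and the polling convention, are illustrative assumptions.

```python
import threading

def or_search(searchers):
    """Sketch of an OR of tasks: run several searchers in
    parallel; the first to find a solution turns off the rest."""
    stop = threading.Event()
    found = []
    lock = threading.Lock()

    def run(search):
        result = search(stop)      # searcher polls stop.is_set()
        if result is not None:
            with lock:
                if not found:
                    found.append(result)
            stop.set()             # turn off the other searchers

    threads = [threading.Thread(target=run, args=(s,))
               for s in searchers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return found[0] if found else None

def make_searcher(space):
    def search(stop):
        for x in space:
            if stop.is_set():
                return None        # another searcher already won
            if x % 7 == 0:
                return x
        return None
    return search

answer = or_search([make_searcher(range(1, 100)),
                    make_searcher(range(100, 200))])
```

Which searcher wins is nondeterministic, which is exactly why the memo prefers serial search unless real time matters.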
Moreover, we may have an "almost AND" of tasks. This occurs when
the tasks are normally ANDed, but an error condition in one of the
tasks may require the suspension of the others. More generally,
one would suppose that a THROW (MACLISP terminology) would result in the suspension
of all tasks subordinate to the CATCH that caught the THROW.
Alternatively, suspending the tasks could be an explicit action of
the program activated by the CATCH.
All this requires that the tasks must be subject to interrupts.
It would seem that any process should be able to generate an interrupt.
This would cause each process to go to its interrupt routine, which would
determine whether it would continue or would THROW to a suitable CATCH
statement. The interrupt could then be a conditional THROW and could be a
parameter of a somewhat generalized CATCH. In general, the THROW-CATCH
mechanism seems well suited to interrupt handling.
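The interrupt-as-conditional-THROW idea maps naturally onto exception handling. In this hedged sketch, a shared flag plays the interrupt, a check at safe points plays the conditional THROW, and a try/except plays the CATCH; the class and function names are illustrative, not MACLISP's.

```python
import threading

class Throw(Exception):
    """Stands in for a MACLISP THROW; the matching CATCH is the
    enclosing try/except."""

interrupt_flag = threading.Event()

def check_interrupt():
    """Sketch of the conditional THROW: tasks call this at safe
    points; if some process has raised the interrupt, control is
    thrown to the enclosing CATCH."""
    if interrupt_flag.is_set():
        raise Throw()

def task(limit):
    for i in range(limit):
        check_interrupt()
        if i == 50:
            interrupt_flag.set()   # an error condition in this task
    return "finished"

try:                               # the CATCH that catches the THROW
    outcome = task(100)
except Throw:
    outcome = "suspended"
```

With several tasks sharing the flag, one task's error condition suspends all tasks subordinate to the same CATCH, the "almost AND" described above.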
4. The above considerations are vaguer than is desirable.
Certainly they don't constitute a specification. More work would
refine them, but I don't think that it is possible to determine
without experience what is required for run-time control of parallelism.
On the other hand, I am convinced that run-time control is needed
to get the full benefit of the multi-processing hardware of the
S-1 or similar computers. I believe that the facilities described
above will be a useful subset of what will eventually be needed
and will permit acquiring the necessary experience.
5. The best vehicle for experiments may be the planned COMMON
LISP implementation on the S-1. As suggested above, rather than merely
interpreting the LISP primitives for multi-processing, the necessary
multi-processing primitives should be added to the implementation itself.
In conclusion, we assert that this approach to multi-processing
is one of the several that should be implemented.
John McCarthy
1982 February 5
[This document is MULTIP[1,JMC] at S1-A].